
Kotlin For Apache Spark preview 2


Ecosystem | Kotlin for Apache Spark: One Step Closer to Your Production Cluster

By Pasha Finkelshteyn

We have released Preview 2 of Kotlin for Apache Spark. First of all, we’d like to thank the community for providing us with all their feedback and even some pull requests! Now let’s take a look at what we have implemented since Preview 1.

Scala 2.11 and Spark 2.4.1+ support

The main change in this preview is the introduction of Spark 2.4 and Scala 2.11 support. This means that you can now run jobs written in Kotlin in your production environment.

The syntax remains the same as the Apache Spark 3.0 compatible version, but the installation procedure differs a bit. You can now use sbt to add the correct dependency, for example:

libraryDependencies += "org.jetbrains.kotlinx.spark" %% "kotlin-spark-api-2.4" % "1.0.0-preview2"

Of course, for other build systems, such as Maven, we would use the standard:

<dependency>
  <groupId>org.jetbrains.kotlinx.spark</groupId>
  <artifactId>kotlin-spark-api-2.4_2.11</artifactId>
  <version>1.0.0-preview2</version>
</dependency>

And for Gradle we would use:

implementation 'org.jetbrains.kotlinx.spark:kotlin-spark-api-2.4_2.11:1.0.0-preview2'

Spark 3 only supports Scala 2.12 for now, so for the Spark 3 variant of the artifact there is no need to specify the Scala version.

You can read more about dependencies in the project documentation.

Support for the custom SparkSessionBuilder

Imagine you have a company-wide SparkSession.Builder that is set up to work with your YARN/Mesos/etc. cluster. You don’t want to recreate it, and the “withSpark” function alone doesn’t give you enough flexibility. Before Preview 2, you had only one option: rewrite the whole builder from Scala to Kotlin. Now you have another one, as the “withSpark” function accepts a SparkSession.Builder as an argument:

val builder = // obtain your builder here

withSpark(builder, logLevel = DEBUG) {
    // your spark is initialized with the correct config here
}

Broadcast variable support

Sometimes we need to share data between executor nodes. Occasionally, we need to provide executor nodes with dictionary-like data without pushing it to each node individually. Apache Spark lets you do this with broadcast variables. You were previously able to achieve this using the “encoder” function, but we are aiming to provide an API experience that is as consistent as possible with that of the Scala API, so we’ve added explicit support for broadcasting variables. For example:

// Broadcast data should implement Serializable
data class SomeClass(val a: IntArray, val b: Int) : Serializable

// some other code snipped

val receivedBroadcast = broadcast.value // the broadcast value is available
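A broadcast payload has to survive JVM serialization on its way to the executors. A quick way to sanity-check a payload class off-cluster is a plain serialization round-trip; this is a minimal sketch in plain Kotlin with no Spark required, and the `roundTrip` helper is defined here for illustration, not part of the API:

```kotlin
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.io.ObjectInputStream
import java.io.ObjectOutputStream
import java.io.Serializable

// The payload class from the snippet above; broadcast data must be Serializable.
data class SomeClass(val a: IntArray, val b: Int) : Serializable

// Round-trips a value through JVM serialization, the same mechanism Spark
// relies on when shipping a broadcast variable to executor nodes.
fun <T : Serializable> roundTrip(value: T): T {
    val bytes = ByteArrayOutputStream().also { bos ->
        ObjectOutputStream(bos).use { it.writeObject(value) }
    }.toByteArray()
    @Suppress("UNCHECKED_CAST")
    return ObjectInputStream(ByteArrayInputStream(bytes)).use { it.readObject() } as T
}
```

If a field of the class is not serializable, the round-trip fails immediately with a NotSerializableException, which is much cheaper to discover locally than on the cluster.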

We’d like to thank Jolan Rensen for the contribution.

Primitive array support for Spark 3

If you use arrays of primitives in your data pipeline, you may be glad to know that they now work in Kotlin for Apache Spark. Arrays of data (or Java bean) classes were already supported in Preview 1; Preview 2 adds support for arrays of primitives. Please note that this works correctly only with the dedicated Kotlin primitive array types, such as IntArray, BooleanArray, and LongArray. In other cases, we recommend using lists or other collection types.
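The reason the dedicated types matter is how they map to the JVM: IntArray compiles to a genuine int[] of unboxed primitives, while Array&lt;Int&gt; becomes a boxed Integer[], which the new primitive-array support does not cover. A small sketch in plain Kotlin makes the difference visible; `isPrimitiveArray` is a helper defined here for illustration:

```kotlin
// Returns true only for JVM primitive arrays such as int[], long[], boolean[].
fun isPrimitiveArray(arr: Any): Boolean =
    arr.javaClass.isArray && arr.javaClass.componentType.isPrimitive

fun main() {
    println(isPrimitiveArray(intArrayOf(1, 2, 3))) // IntArray -> int[], prints "true"
    println(isPrimitiveArray(arrayOf(1, 2, 3)))    // Array<Int> -> Integer[], prints "false"
}
```

So when declaring fields in the classes you put into a Dataset, prefer IntArray, LongArray, BooleanArray, and friends over their boxed Array&lt;T&gt; counterparts.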

Future work

Now that Spark 2.4 is supported, our main goal for Preview 3 is to extend the Column API and add more operations on columns. Once Apache Spark with Scala 2.13 support becomes available, we will add Scala 2.13 support as well.

With the support introduced in Preview 2, now is a great time to give it a try! The official repository and the quick start guide are excellent places to start.
